adding variable length attention to llama3 8b #2000
Conversation
Force-pushed eeecb63 to cad97e5
fegin left a comment:
This implementation won't work with PP and is too intrusive to the model code. The packing logic should be hidden inside the inner attention.
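(For illustration, a minimal sketch of what keeping the packing logic inside the inner attention could look like; the class name, the `seq_lens` argument, and the per-document loop are assumptions, not torchtitan's actual implementation, which would call a fused varlen kernel instead of looping.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InnerAttention(nn.Module):
    """Sketch: packing details stay inside the attention module, so the rest
    of the model (and PP stage splitting) only ever sees plain (q, k, v)."""

    def forward(self, q, k, v, seq_lens=None):
        # q, k, v: (batch, n_heads, seqlen, head_dim)
        if seq_lens is None:
            # Dense path: ordinary causal SDPA.
            return F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # "Varlen" path (illustrative only): attend within each packed
        # document independently; a real kernel would take cu_seqlens
        # instead of looping in Python.
        out = torch.empty_like(q)
        start = 0
        for length in seq_lens:
            end = start + length
            out[:, :, start:end] = F.scaled_dot_product_attention(
                q[:, :, start:end], k[:, :, start:end], v[:, :, start:end],
                is_causal=True,
            )
            start = end
        return out


# The caller never packs or unpacks anything itself:
q = k = v = torch.randn(1, 8, 10, 64)
out = InnerAttention()(q, k, v, seq_lens=[4, 6])  # two docs packed into one sequence
```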
Force-pushed 55352a5 to 066ca02
Force-pushed 066ca02 to c9b6d5c
fegin left a comment:
LGTM, thanks for the update. Left some other comments; once they are addressed, this PR should be ready.
Force-pushed a902cbe to de416f9
tianyu-l left a comment:
Thanks! Left some comments, please see if they make sense to you.
Force-pushed caafc81 to 4d36560
Force-pushed 9380847 to 42c0c85
tianyu-l left a comment:
Left some more comments. If you'd like to focus on Llama 3 in this PR, that's fine with me too.
Force-pushed 5528029 to 31c1c77
fegin left a comment:
LGTM, we can leave other models to other PR(s).
Force-pushed b717da3 to 9c99fcb
Reviewed lines:

```
    xv,
    self.head_dim,
    attention_masks,
    is_causal=True,
```
This would fail? I think `is_causal` is no longer accepted.
Btw, it seems varlen is not tested in CI; can we add a test similar to https://github.com/pytorch/torchtitan/blob/main/tests/integration_tests/features.py#L336?
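(A varlen entry could look roughly like the sketch below, mirroring the flex-attention test linked above. This is a fragment meant to live in that test list; `OverrideDefinitions` is the helper already used there, and the flavor name, descriptions, and field layout are assumptions to verify against the current file.)

```python
# Hypothetical addition to tests/integration_tests/features.py; the
# "debugmodel_varlen_attn" flavor and the field values are assumptions.
OverrideDefinitions(
    [
        [
            "--model.flavor debugmodel_varlen_attn",
        ],
    ],
    "Variable length attention",
    "varlen_attn",
    ngpu=4,
)
```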
Force-pushed 1af38e5 to df22636
Force-pushed df22636 to 2b1a40f
tianyu-l left a comment:
LGTM.
We need to modify the `save_list` of SAC to save the result of varlen attention, to be consistent with the other attention implementations. This can be done in the next PR.
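(Sketch of that follow-up, assuming SAC's policy is driven by an op set like torchtitan's `_save_list`; which ATen op the varlen path actually dispatches to is an assumption here.)

```python
import torch

# Existing-style save list (abridged) plus the varlen attention op, so its
# output is saved rather than recomputed under per-op SAC. Whether varlen
# shows up as aten._flash_attention_forward is an assumption to verify.
_save_list = {
    torch.ops.aten.mm.default,
    torch.ops.aten._scaled_dot_product_flash_attention.default,
    torch.ops.aten._scaled_dot_product_efficient_attention.default,
    torch.ops.aten._flash_attention_forward.default,  # new: varlen attention
}
```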
Reviewed lines:

```
[
    [
        "--parallelism.data_parallel_shard_degree=4",
        "--activation_checkpoint.mode='full'",
```
Let's use `per_op_sac` like the test above.
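(The suggested override would look roughly like this; the flag spellings follow torchtitan's activation-checkpoint config but should be checked against the current job config.)

```python
[
    [
        "--parallelism.data_parallel_shard_degree=4",
        "--activation_checkpoint.mode=selective",
        "--activation_checkpoint.selective_ac_option=op",
    ],
]
```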
This reverts commit f8fa21e.
Summary
This PR adds variable length attention (varlen) support to the Llama 3 8B model in torchtitan. We replace `use_flex_attn` with `attn_type` (one of "sdpa", "varlen", or "flex"). If `attn_type = "varlen"`, the attention module calls a compiled `varlen_attn` defined here.
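(A condensed sketch of the dispatch this describes; `build_attention` and the `varlen_attn` signature below are placeholders, not the PR's exact code.)

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention


def varlen_attn(q, k, v, cu_seqlens, max_seqlen):
    """Placeholder for the compiled varlen kernel this PR adds; the real one
    wraps flash_attention_forward / flash_attention_backward."""
    raise NotImplementedError


def build_attention(attn_type: str):
    # attn_type replaces the old use_flex_attn flag: "sdpa" | "flex" | "varlen".
    if attn_type == "sdpa":
        return lambda q, k, v: F.scaled_dot_product_attention(q, k, v, is_causal=True)
    if attn_type == "flex":
        return torch.compile(flex_attention)
    if attn_type == "varlen":
        return torch.compile(varlen_attn)
    raise ValueError(f"unknown attn_type: {attn_type}")
```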
Testing
Ran loss and performance tests against flex attention; loss is on par. Varlen is slightly slower than flex due to CUDA kernel speeds (varlen calls into `flash_attention_forward`/`flash_attention_backward` today).